1 Disc - An Interactive Classification of Web Documents by Self-Organizing Maps and
Search Engines 

Size: 5.70KB MIME type: text/html


When we use search engines to retrieve web documents, we get more answers than
ever before, so examining each web document takes a great deal of effort. Furthermore,
conventional search engines cannot retrieve the pertinent web documents when a
specific topic is described by more than one web document. In this system, web
documents are automatically clustered by their feature vectors, produced from web
documents or from minimal subgraphs consisting of multiple web documents, and their
overview ...


The classification is no longer used as of January 1998, but the item is still searchable 
for previously classified documents. The classification is no longer used as of January 
1991, but the item is still searchable for previously classified documents. 


2 IPC Taxonomy. The International Patent Classification (IPC) is a complex hierarchical
classification system comprising sections, classes, subclasses and groups. 2.1
Document collection. In order to perform automated patent categorization, we have
collected a large database of suitable patent documents that we name WIPO-alpha.
Classes with the largest numbers of training documents are labeled, as are those with
many more test documents than training documents.


The learner works on the training documents, represented according to the learned
feature set, to induce the classification model. Learning Kernels: When a learner obtains a
feature set, the set of training documents is represented relative to that feature set,
and then the learner applies its inductive algorithm to learn a classification model. Since
there could be four kinds of redundant information in documents, the system can run
four learners: the abstract/meta information learner, ...
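Representing each training document relative to a learned feature set, before the learner runs its inductive algorithm, can be sketched as follows. This is a minimal illustration assuming a bag-of-words representation with raw term counts over already-tokenized documents; the snippet does not say which representation the system actually uses, and the function names `represent` and `represent_corpus` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def represent(doc_tokens: Sequence[str], feature_set: Sequence[str]) -> List[int]:
    """Map one tokenized document onto a fixed, ordered feature set.

    Assumption: each position holds the raw count of that feature in the
    document. Tokens that are not in the feature set are ignored.
    """
    counts = Counter(doc_tokens)
    # Counter returns 0 for features that never occur in the document.
    return [counts[feature] for feature in feature_set]


def represent_corpus(
    docs: Sequence[Sequence[str]], feature_set: Sequence[str]
) -> List[List[int]]:
    """Represent every training document relative to the same feature set."""
    return [represent(doc, feature_set) for doc in docs]


if __name__ == "__main__":
    features = ["patent", "claim", "kernel"]
    docs = [["patent", "claim", "claim"], ["kernel", "svm"]]
    print(represent_corpus(docs, features))  # [[1, 2, 0], [0, 0, 1]]
```

Because out-of-vocabulary tokens such as `svm` are dropped, every document yields a vector of the same length, which is the fixed-width input an inductive learner expects.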


Copyright 2005, by the Association for Computing Machinery, Inc. Permission to make
digital or hard copies of part or all of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or commercial
advantage and that copies bear this notice and the full citation on the first page. To
copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior
specific permission and/or a fee. Request permission...


2 http://www.acm.org/class/1998/acmccs98-1.2.3.xml
Size: 125.43KB MIME type: text/xml


3 Automated categorization in the IPC

Size: 436.24KB MIME type: application/pdf


4 MultiStratSIGKDD format revPDF

Size: 338.63KB MIME type: application/pdf


5 1998 ACM Computing Classification System:
Size: 80.85KB MIME type: text/html

6 Using the Java Document Generator 

Size: 23.45KB MIME type: text/html 

Java programmers have become familiar with the online Java API documentation, 
available from Sun and mirrored elsewhere [8]. This professional looking style of code 
documentation is created semi-automatically from commented source code by the 
javadoc tool, which comes standard with the Java Software Development Kit (SDK). 
Javadoc creates HTML pages of documentation for programs or custom API, as defined 
in Java source code files. Javadoc parses the raw source code files looking for 
Javadoc ... 

7 DL94: Navigating and Searching in Hierarchical Digital Library Catalogs 

Size: 20.49KB MIME type: text/html 

The Subject Hierarchy list displays the hierarchy nodes above the books currently being 
displayed on the Book Shelf. When the user clicks on a book title, a Book Display 
widget is opened showing the full record for that book. Following a search the user can 
step forward and backward to the next matched book with the Up_Book and 
Down_Book buttons. 

8 Document Analysis Systems 2004 

Size: 32.97KB MIME type: text/html 

Henry S. Baird, Venugopal Govindaraju, Daniel P. Lopresti: Document Analysis Systems 
for Digital Libraries: Challenges and Opportunities. Yann Leydier, Frank Le Bourgeois, 
Hubert Emptoz: Serialized k-Means for Adaptative Color Image Segmentation: 
Application to Document Images and Others. Koichi Kise, Shota Fukushima, Keinosuke 
Matsumoto: Document Image Retrieval In a Question Answering System for Document 
Images. 

9 http://acm.org/sigs/sigkdd/explorations/issues/1-2-2000-Q1/chakrabarti.pdf 

Size: 238.00KB MIME type: application/pdf 

Outline: In this survey we will concentrate on statistical techniques for learning
structure in various forms from text, hypertext and semistructured data. Models: In §2
we describe some of the models used to represent hypertext and semistructured data.
Supervised learning: In §3 we discuss techniques for supervised learning or classification.
Unsupervised learning: The other end of the spectrum, unsupervised learning or
clustering, is discussed in §4. Semisupervised learning. Real-life applications ...

10 ICDAR1999 

Size: 127.99KB MIME type: text/html 

Vincenzo Di Lecce, Giovanni Dimauro, Andrea Guerriero, Giovanni Impedovo, Giuseppe
Pirlo, A. Salzo: Electronic Document Image Resizing. Minako Sawaki, Hiroshi Murase,
Norihiro Hagita: Character Recognition in Bookshelf Images using Context-based Image
Templates. Masayoshi Okamoto, Kazuhiko Yamamoto: On-line Handwritten Character
Recognition Method using Directional Features and Clockwise/Counterwise
Direction-Change Features.
11 Document Analysis Systems 2002

Size: 36.09KB MIME type: text/html 

Document Analysis Systems V, 5th International Workshop, DAS 2002, Princeton, NJ,
USA, August 19-21, 2002, Proceedings. Yue Lu, Chew Lim Tan: Word Searching in
Document Images Using Word Portion Matching. Artem Mikheev, Luc Vincent, Mike
Hawrylycz, Leon Bottou: Electronic Document Publishing Using DjVu.

12 ICDAR 2003 

Size: 132.01KB MIME type: text/html 

Lionel Prevost, C. Michel-Sendis, A. Moises, Loic Oudot, Maurice Milgram: Combining
model-based and discriminative classifiers: application to handwritten character
recognition. Seiichi Uchida, Hiroaki Sakoe: Handwritten character recognition using
elastic matching based on a class-dependent deformation model. Hiromitsu Nishimura,
Takehiko Timikawa: Off-line Character Recognition using On-line Character Writing
Information.

13 ICDAR 2001 

Size: 151.97KB MIME type: text/html 

Christophe Choisy, Abdel Belaid: Handwriting Recognition Using Local Methods for 
Normalization and Global Methods for Recognition. Urs-Viktor Marti, Horst Bunke: Text 
Line Segmentation and Word Recognition in a System for General Writer Independent 
Handwriting Recognition. Bertrand Couasnon: DMOS: A Generic Document Recognition 
Method, Application to an Automatic Generator of Musical Scores, Mathematical 
Formulae and Table Structures Recognition Systems. 

14 ACM Symposium on Document Engineering 2003
Size: 22.89KB MIME type: text/html 

Proceedings of the 2003 ACM Symposium on Document Engineering, Grenoble, France, 
November 20-22, 2003. Paula Leinonen: Automating XML document structure 
transformations. Jan Scheffczyk, Uwe M. Borghoff, Peter Rodig, Lothar Schmitz: 
Consistent document engineering: formalizing type-safe consistency rules for 
heterogeneous repositories. 

15 http://info.acm.org/crossroads/xrds10-2/challenge.html

Size: 29.37KB MIME type: text/html 

The three most common approaches have focused on information extraction,
information categorization, and information retrieval. For example, information
extraction examines the semantics of a document, whereas information categorization
considers the way the document is subdivided. Information retrieval techniques look
into ways to retrieve relevant information from the collection of documents efficiently
and effectively.

16 uic-final.doc 

Size: 39.89KB MIME type: application/pdf 

This raises the accuracy of retrieval to beyond 95%. Some representative publications
are [3][4]. 3. Scalable Content-based Indexing and Retrieval for Audio and Image Data
Archives (Ashfaq Khokhar and colleagues) [1][2][10]. Research in the area of
content-based indexing and retrieval (CBIR) has primarily focused on indexing and retrieval of
visual information, such as image and video data. Furthermore, content-based audio
indexing will also improve efficiency and accuracy of ...


17 http://www.sigir.org/sigirlist/issues/2003/2003-05.txt
Size: 90.38KB MIME type: text/plain 

JCDL encompasses the many meanings of the term "digital libraries", including (but not 
limited to) new forms of information institutions; operational information systems with 
all types of digital content; new means of selecting, collecting, organizing, and 
distributing digital content; digital preservation and archiving; and theoretical models of 
information media, including document genres and electronic publishing. IEEE ICDM 
Best Paper Awards will be conferred at the conference on the ... 


18 SIGCHI Bulletin Vol.30 No.2, April 1998: An Architecture for Content Analysis of
Documents and its Use in Information and Knowledge Management Tasks 
Size: 46.47KB MIME type: text/html 

In particular, language-related work at ATG facilitates a number of information 
management tasks, including: semantic highlighting and indexing, topic identification 
and tracking, content analysis and abstraction, document characterisation, and partial 
document understanding. The SCOOP architecture has been configured to make use of 
the following components for salience-based content characterisation: discourse 
segmentation; phrasal analysis (of nominal expressions and relations); anaphora ... 


19 Modeling of Time and Document Aging for Request Prediction
Size: 32.43KB MIME type: text/html 

With regard to our former prediction scenario, we will focus on document requests
without considering either the order of requests or multiple requests per session [9].
Thus, a user session can be regarded as a binary vector. To measure the
accuracy of the prediction algorithm, we proposed the prediction quality PQ as the quotient
of correctly predicted and wrongly predicted values. The modeling of time and
document aging by using the time factor TF resulted in an improved ...
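The binary session vector and the prediction quality described above can be sketched as follows. This is a minimal illustration under stated assumptions: a session is encoded over a fixed, ordered document catalog (1 if the document was requested, 0 otherwise), a "correctly predicted value" is taken to mean any position where the predicted bit equals the actual bit, and PQ is read literally as correct divided by wrong. The snippet does not define these details, and the function names are hypothetical. The time factor TF is not modeled because its definition is not given here.

```python
from typing import Iterable, List, Sequence


def session_vector(requested: Iterable[str], catalog: Sequence[str]) -> List[int]:
    """Encode a user session as a binary vector over an assumed document catalog.

    Order of requests and repeated requests are ignored, as in the description.
    """
    requested_set = set(requested)
    return [1 if doc in requested_set else 0 for doc in catalog]


def prediction_quality(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Assumed PQ: number of matching positions divided by mismatching ones.

    Returns infinity when every position is predicted correctly.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted vectors must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    wrong = len(actual) - correct
    return float("inf") if wrong == 0 else correct / wrong


if __name__ == "__main__":
    catalog = ["a", "b", "c", "d"]
    actual = session_vector(["b", "a", "b"], catalog)  # [1, 1, 0, 0]
    predicted = [1, 0, 0, 1]
    print(prediction_quality(actual, predicted))  # 2 correct / 2 wrong = 1.0
```

Counting matching zeros as correct is one possible reading; a variant that scores only the documents actually requested would give different values, so the definition should be checked against the full paper.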


20 http://www.sigmod.org/sigmod/record/issues/0203/SPECIAL/8.quix.pdf
Size: 63.36KB MIME type: application/pdf 

Business Data Management for Business-to-Business Electronic Commerce Christoph
Quix Informatik V, RWTH Aachen, Germany quix@cs. 3. INTEGRATION AND
CLASSIFICATION OF PRODUCT DATA A basic functionality of a business data repository
for an electronic marketplace is the integration of external data sources to provide
access to product databases and company profiles. This diversity creates a problem for
product data representation in a business data repository [8]. Existing business-to- ...